Search CORE

10 research outputs found

Estimating Dependency, Monitoring and Knowledge Discovery in High-Dimensional Data Streams

Author: Fouché Edouard
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 07/12/2020

Data Mining – known as the process of extracting knowledge from massive data sets – has a phenomenal impact on our society and now affects nearly every aspect of our lives: from the layout of our local grocery store, to the ads and product recommendations we receive, the availability of treatments for common diseases, the prevention of crime, or the efficiency of industrial production processes. However, Data Mining remains difficult when (1) data is high-dimensional, i.e., has many attributes, and when (2) data comes as a stream. Extracting knowledge from high-dimensional data streams is hard because one must cope with two orthogonal sets of challenges. On the one hand, the effects of the so-called "curse of dimensionality" degrade the performance of statistical methods and lead to increasingly complex Data Mining problems. On the other hand, the statistical properties of data streams may evolve in unexpected ways, a phenomenon known in the community as "concept drift". Thus, one needs to update one's knowledge about the data over time, i.e., to monitor the stream. While previous work addresses high-dimensional data sets and data streams to some extent, their intersection has received much less attention. Nevertheless, extracting knowledge in this setting is advantageous for many industrial applications: identifying patterns in high-dimensional data streams in real time may lead to larger production volumes or reduce operational costs. The goal of this dissertation is to bridge this gap. We first focus on dependency estimation, a fundamental task of Data Mining. Typically, one estimates dependency by quantifying the strength of statistical relationships. We identify the requirements for dependency estimation in high-dimensional data streams and propose a new estimation framework, Monte Carlo Dependency Estimation (MCDE), that fulfils them all. We show that MCDE leads to efficient dependency monitoring.
Then, we generalise the task of monitoring by introducing the Scaling Multi-Armed Bandit (S-MAB) algorithms, which extend the Multi-Armed Bandit (MAB) model. We show that our algorithms can efficiently monitor statistics by leveraging user-specific criteria. Finally, we describe applications of our contributions to Knowledge Discovery. We propose an algorithm, Streaming Greedy Maximum Random Deviation (SGMRD), which exploits our new methods to extract patterns, e.g., outliers, in high-dimensional data streams. We also present a new approach, which we name kj-Nearest Neighbours (kj-NN), to detect outlying documents within massive text corpora. We support our algorithmic contributions with theoretical guarantees, as well as extensive experiments on both synthetic and real-world data. We demonstrate the benefits of our methods on real-world use cases. Overall, this dissertation establishes fundamental tools for Knowledge Discovery in high-dimensional data streams, which help with many industrial applications, e.g., anomaly detection or predictive maintenance. To facilitate the application of our results and future research, we publicly release our implementations, experiments, and benchmark data via open-source platforms.

KITopen

Finite-time Analysis of Globally Nonstationary Multi-Armed Bandits

Author: Fouché Edouard
Honda Junya
Komiyama Junpei
Publication date: 23/07/2021

We consider nonstationary multi-armed bandit problems where the model parameters of the arms change over time. We introduce the adaptive resetting bandit (ADR-bandit), a class of bandit algorithms that leverages adaptive windowing techniques from the data stream community. We first provide new guarantees on the quality of estimators resulting from adaptive windowing techniques, which are of independent interest in the data mining community. Furthermore, we conduct a finite-time analysis of ADR-bandit in two typical environments: an abrupt environment, where changes occur instantaneously, and a gradual environment, where changes occur progressively. We demonstrate that ADR-bandit has nearly optimal performance when abrupt or gradual changes occur in a coordinated manner that we call global changes. We also demonstrate that forced exploration is unnecessary when we restrict our interest to global changes. Unlike existing nonstationary bandit algorithms, ADR-bandit has optimal performance in stationary environments as well as in nonstationary environments with global changes. Our experiments show that the proposed algorithms outperform existing approaches in synthetic and real-world environments.
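The abstract does not spell out ADR-bandit itself, so the following is only a rough sketch of the general idea it names: a standard UCB1 policy combined with an ADWIN-style adaptive-window test, where a change detected on any arm discards the statistics of all arms (a "global" reset). The function names, the per-arm windows, the full reset, the confidence parameter `delta` and the `abrupt_swap` environment are all illustrative assumptions, not the paper's algorithm.

```python
import math
import random


def adwin_cut(window, delta=0.01):
    """ADWIN-style check: return the latest split index at which the mean of
    the older part differs significantly from the newer part, or None."""
    n = len(window)
    total = sum(window)
    head = 0.0
    cut = None
    for i in range(1, n):
        head += window[i - 1]
        n0, n1 = i, n - i
        m = 1.0 / (1.0 / n0 + 1.0 / n1)  # combined sample size of the split
        eps = math.sqrt(math.log(4.0 * n / delta) / (2.0 * m))  # Hoeffding-type cut
        if abs(head / n0 - (total - head) / n1) > eps:
            cut = i  # keep scanning: prefer the most recent significant split
    return cut


def adr_ucb(reward_fn, k, horizon, delta=0.01, seed=0):
    """UCB1 with a global reset: when the window of the arm just pulled shows
    a significant change, the statistics of *all* arms are discarded."""
    rng = random.Random(seed)
    windows = [[] for _ in range(k)]  # rewards per arm since the last reset
    pulls, resets = [], []
    for t in range(horizon):
        fresh = [a for a in range(k) if not windows[a]]
        if fresh:
            arm = fresh[0]  # play every arm once after a (re)start
        else:
            n = sum(len(w) for w in windows)
            arm = max(
                range(k),
                key=lambda a: sum(windows[a]) / len(windows[a])
                + math.sqrt(2.0 * math.log(n) / len(windows[a])),
            )
        windows[arm].append(reward_fn(arm, t, rng))
        pulls.append(arm)
        if adwin_cut(windows[arm], delta) is not None:
            windows = [[] for _ in range(k)]  # treat the change as global
            resets.append(t)
    return pulls, resets


def abrupt_swap(arm, t, rng):
    """Hypothetical Bernoulli environment: both arms swap means at t = 500."""
    p = (0.9, 0.1)[arm] if t < 500 else (0.1, 0.9)[arm]
    return 1.0 if rng.random() < p else 0.0


pulls, resets = adr_ucb(abrupt_swap, k=2, horizon=1000)
```

Without the reset, UCB1 would keep favouring arm 0 for a long time after the swap, because its accumulated statistics dominate; the reset lets the policy re-learn within a few rounds. The quadratic split scan in `adwin_cut` is chosen for readability; practical adaptive-windowing implementations compress the window into buckets.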

arXiv.org e-Print Archive

Budgeted Multi-Armed Bandits with Asymmetric Confidence Intervals

Author: Arzamasov Vadim
Böhm Klemens
Fouché Edouard
Heyden Marco
Publication date: 15/08/2023

We study the stochastic Budgeted Multi-Armed Bandit (MAB) problem, where a player chooses from $K$ arms with unknown expected rewards and costs. The goal is to maximize the total reward under a budget constraint. A player thus seeks to choose the arm with the highest reward-cost ratio as often as possible. Current state-of-the-art policies for this problem have several issues, which we illustrate. To overcome them, we propose a new upper confidence bound (UCB) sampling policy, $\omega$-UCB, that uses asymmetric confidence intervals. These intervals scale with the distance between the sample mean and the bounds of a random variable, yielding a more accurate and tighter estimate of the reward-cost ratio than those of our competitors. We show that our approach has logarithmic regret and consistently outperforms existing policies in synthetic and real settings.
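To illustrate how an asymmetric interval can enter a budgeted UCB index, here is a small sketch that scores each arm by an optimistic reward-cost ratio: the upper confidence bound of its mean reward divided by the lower confidence bound of its mean cost. As the asymmetric interval we substitute the Wilson score interval for [0, 1]-valued variables; this is an assumption for illustration and not the interval defined by $\omega$-UCB. The confidence schedule `z = sqrt(2 ln t)`, the function names and the `two_arms` environment are likewise hypothetical.

```python
import math
import random


def wilson_bounds(mean, n, z):
    """Wilson score interval for the mean of a [0, 1]-valued variable.
    It is asymmetric around `mean` and always stays within [0, 1]."""
    denom = 1.0 + z * z / n
    centre = (mean + z * z / (2.0 * n)) / denom
    half = (z / denom) * math.sqrt(mean * (1.0 - mean) / n + z * z / (4.0 * n * n))
    return centre - half, centre + half


def budgeted_ratio_ucb(pull, k, budget, seed=0):
    """Play until the budget is spent, always choosing the arm with the
    largest optimistic ratio: reward upper bound / cost lower bound."""
    rng = random.Random(seed)
    counts, rewards, costs = [0] * k, [0.0] * k, [0.0] * k
    spent, t = 0.0, 0
    while spent < budget:
        t += 1
        if t <= k:
            arm = t - 1  # initialise each arm once
        else:
            z = math.sqrt(2.0 * math.log(t))  # hypothetical confidence schedule

            def index(a):
                _, r_hi = wilson_bounds(rewards[a] / counts[a], counts[a], z)
                c_lo, _ = wilson_bounds(costs[a] / counts[a], counts[a], z)
                return r_hi / max(c_lo, 1e-12)  # guard against a zero lower bound

            arm = max(range(k), key=index)
        reward, cost = pull(arm, rng)
        counts[arm] += 1
        rewards[arm] += reward
        costs[arm] += cost
        spent += cost
    return counts, sum(rewards)


def two_arms(arm, rng):
    """Hypothetical arms: Bernoulli rewards (0.8 vs 0.3), costs in [0.4, 0.6]."""
    reward = 1.0 if rng.random() < (0.8, 0.3)[arm] else 0.0
    return reward, rng.uniform(0.4, 0.6)


counts, total_reward = budgeted_ratio_ucb(two_arms, k=2, budget=200.0)
```

One property that makes such intervals attractive for ratios: a symmetric lower bound on a cost mean close to 0 can drop to or below zero and make the optimistic ratio blow up, whereas the Wilson interval never leaves [0, 1].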

arXiv.org e-Print Archive

A framework for dependency estimation in heterogeneous data streams

Author: Böhm Klemens
Fouché Edouard
Kalinke Florian
Mazankiewicz Alan
Publication venue: Springer
Publication date: 02/07/2020

Estimating dependencies from data is a fundamental task of Knowledge Discovery. Identifying the relevant variables leads to a better understanding of data and improves both the runtime and the outcomes of downstream Data Mining tasks. Dependency estimation from static numerical data has received much attention. However, real-world data often occurs as heterogeneous data streams: On the one hand, data is collected online and is virtually infinite. On the other hand, the various components of a stream may be of different types, e.g., numerical, ordinal or categorical. For this setting, we propose Monte Carlo Dependency Estimation (MCDE), a framework that quantifies multivariate dependency as the average statistical discrepancy between marginal and conditional distributions, via Monte Carlo simulations. MCDE handles heterogeneity by leveraging three statistical tests: the Mann–Whitney U, the Kolmogorov–Smirnov and the Chi-Squared test. We demonstrate that MCDE goes beyond the state of the art regarding dependency estimation by meeting a broad set of requirements. Finally, we show with a real-world use case that MCDE can discover useful patterns in heterogeneous data streams.
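The core idea described above, averaging a test-based discrepancy between a conditional and a marginal sample over random slices, can be sketched in a few lines. This simplified version handles numerical columns only and uses just the Mann–Whitney U test (normal approximation, no tie correction of the variance); it compares the values of a reference column inside a random slice with those outside it, and draws each slice as a contiguous block of ranks. These choices, the slice-size rule, the parameter names and the function `mcde_like` are assumptions for illustration; they do not reproduce the framework's exact procedure, nor its Kolmogorov–Smirnov and Chi-Squared variants for heterogeneous data.

```python
import math
import random


def mann_whitney_p(a, b):
    """Two-sided p-value of the Mann-Whitney U test (normal approximation,
    average ranks for ties, no tie correction of the variance)."""
    pooled = sorted((v, g) for g, xs in enumerate((a, b)) for v in xs)
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # tied positions i+1..j share their mean rank
        rank_sum_a += avg_rank * sum(1 for _, g in pooled[i:j] if g == 0)
        i = j
    n1, n2 = len(a), len(b)
    u = rank_sum_a - n1 * (n1 + 1) / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - n1 * n2 / 2.0) / sigma
    return math.erfc(abs(z) / math.sqrt(2.0))


def mcde_like(data, iterations=100, alpha=0.5, seed=0):
    """Average of 1 - p over random slices (simplified MCDE-style contrast).
    `data` is a list of rows with at least two numerical columns."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    # Row indices sorted by each column, used to draw contiguous rank slices.
    order = [sorted(range(n), key=lambda r: data[r][c]) for c in range(d)]
    width = max(1, int(n * alpha ** (1.0 / (d - 1))))  # slice size per column
    total = 0.0
    for _ in range(iterations):
        ref = rng.randrange(d)  # reference column whose conditional is tested
        inside = set(range(n))
        for c in range(d):
            if c != ref:
                start = rng.randrange(n - width + 1)
                inside &= set(order[c][start:start + width])
        cond = [data[r][ref] for r in inside]
        rest = [data[r][ref] for r in range(n) if r not in inside]
        if cond and rest:
            total += 1.0 - mann_whitney_p(cond, rest)
    return total / iterations


rng = random.Random(1)
xs = [rng.random() for _ in range(200)]
dependent = [(x, x + rng.gauss(0.0, 0.05)) for x in xs]
independent = [(x, rng.random()) for x in xs]
dep_score = mcde_like(dependent)
indep_score = mcde_like(independent)
```

Under independence the p-values are roughly uniform, so this averaged score hovers around 0.5 rather than 0; a clearly higher value indicates dependency.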

KITopen

Adaptive Bernstein Change Detector for High-Dimensional Data Streams

Author: Arzamasov Vadim
Böhm Klemens
Fenn Tanja
Fouché Edouard
Heyden Marco
Kalinke Florian
Publication date: 22/06/2023

Change detection is of fundamental importance when analyzing data streams. Detecting changes both quickly and accurately enables monitoring and prediction systems to react, e.g., by issuing an alarm or by updating a learning algorithm. However, detecting changes is challenging when observations are high-dimensional. In high-dimensional data, change detectors should not only be able to identify when changes happen, but also in which subspace they occur. Ideally, one should also quantify how severe they are. Our approach, ABCD, has these properties. ABCD learns an encoder-decoder model and monitors its accuracy over a window of adaptive size. ABCD derives a change score based on Bernstein's inequality to detect deviations in terms of accuracy, which indicate changes. Our experiments demonstrate that ABCD outperforms its best competitor by at least 8% and up to 23% in F1-score on average. It can also accurately estimate the subspace in which a change occurs, together with a severity measure that correlates with the ground truth.
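For reference, the textbook two-sided form of Bernstein's inequality, on which ABCD's change score is based, is given below for the mean $\bar{X}_n$ of $n$ i.i.d. variables $X_i$ with expectation $\mu$, variance $\sigma^2$ and $|X_i - \mu| \le b$. How ABCD applies it to the encoder-decoder's accuracy over its adaptive window is not described in the abstract, so this is background only, not the paper's score.

```latex
\Pr\left( \left| \bar{X}_n - \mu \right| \ge \varepsilon \right)
  \le 2 \exp\left( - \frac{n \varepsilon^2}{2 \sigma^2 + \tfrac{2}{3} b \varepsilon} \right)
```

Read in a monitoring context (our assumption), $X_i$ could be per-observation reconstruction errors: if a recent error mean deviates from the reference mean by an $\varepsilon$ whose bound is very small, a change is likely. Unlike Hoeffding's inequality, the bound becomes tighter when the variance $\sigma^2$ is small relative to the range $b$.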

arXiv.org e-Print Archive

Scalable Online Change Detection for High-dimensional Data Streams

Author: Böhm Klemens
Fouché Edouard
Heyden Marco
Kalinke Florian
Publication date: 25/05/2022

Detecting changes in data streams is a core objective in their analysis and has applications in, say, predictive maintenance, fraud detection, and medicine. A principled approach to detect changes is to compare distributions observed within the stream to each other. However, data streams often are high-dimensional, and changes can be complex, e.g., they may only manifest themselves in higher moments. The streaming setting also imposes heavy memory and computation restrictions. We propose an algorithm, Maximum Mean Discrepancy Adaptive Windowing (MMDAW), which leverages the well-known Maximum Mean Discrepancy (MMD) two-sample test and facilitates its efficient online computation on windows whose size it flexibly adapts. As MMD is sensitive to any change in the underlying distribution, our algorithm is a general-purpose non-parametric change detector that fulfills the requirements imposed by the streaming setting. Our experiments show that MMDAW achieves better detection quality than state-of-the-art competitors.
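The statistic at the heart of MMDAW is the Maximum Mean Discrepancy. The sketch below computes the standard unbiased estimate of squared MMD with a Gaussian kernel and compares an older window against a newer one. It deliberately omits what the abstract highlights as MMDAW's contribution, namely efficient online computation and adaptive window sizes; the bandwidth, window sizes and synthetic data are arbitrary choices for illustration.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth):
    """k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs, ys, bandwidth=1.0):
    """Unbiased estimate of the squared MMD between samples xs and ys."""
    m, n = len(xs), len(ys)
    k_xx = sum(gaussian_kernel(xs[i], xs[j], bandwidth)
               for i in range(m) for j in range(m) if i != j)
    k_yy = sum(gaussian_kernel(ys[i], ys[j], bandwidth)
               for i in range(n) for j in range(n) if i != j)
    k_xy = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return k_xx / (m * (m - 1)) + k_yy / (n * (n - 1)) - 2.0 * k_xy / (m * n)


rng = random.Random(0)
# Hypothetical 3-dimensional windows: a reference window, a window from the
# same distribution, and a window whose mean is shifted by 1 in every dimension.
older = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(100)]
same = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(100)]
shifted = [[rng.gauss(1.0, 1.0) for _ in range(3)] for _ in range(100)]
score_same = mmd2_unbiased(older, same)
score_shift = mmd2_unbiased(older, shifted)
```

A value near 0 suggests that both windows come from the same distribution, while a clearly positive value signals a change; in practice, a threshold or a permutation test decides significance. The estimate costs quadratic time in the window size, which is why an efficient online variant matters in the streaming setting.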

arXiv.org e-Print Archive

Tandem Outlier Detectors for Decentralized Data

Author: Fouché Edouard
Gwosch Thomas
Heyden Marco
Matthiesen Sven
Thoma Steffen
Wilwer Jürgen
Publication venue: Association for Computing Machinery
Publication date: 24/08/2023

KITopen

Biological and environmental influence on tissue fatty acid compositions in wild tropical tunas

Author: Amiel Aurélien
Bodin Nathalie
Debrauwer Laurent
Fouché Edwin
Kraffe Edouard
Ménard Frédéric
Sardenne Fany
Publication venue: Elsevier BV
Publication date: 01/02/2017

This study examined the fatty acid composition of three sympatric tropical tuna species (bigeye Thunnus obesus, yellowfin T. albacares and skipjack tuna Katsuwonus pelamis) sampled in the Western Indian Ocean in 2013. The fatty acid compositions of neutral and polar lipids, respectively involved in energy storage and cell membrane structure, were explored and compared in four tissues (red and white muscles, liver and gonads), according to biological (size, sex and maturity) and environmental (season and area) factors. The liver and the red muscle were the fattest tissues (i.e., higher levels of storage lipids) in all species, and polar lipids were lowest in the white muscle. Species and tissue types explained most differences in fatty acid compositions, while environmental factors had limited effects, except in the hepatic cell membrane, where fatty acid composition varied with the monsoons. Docosahexaenoic acid (22:6n-3) was the major fatty acid in both polar and neutral lipid fractions, especially in muscles. Eicosapentaenoic acid (20:5n-3) and oleic acid (18:1n-9) were in higher proportion in neutral than in polar lipids. Arachidonic acid (20:4n-6) and 22:6n-3, together with docosapentaenoic acid (22:5n-6) and stearic acid (18:0), showed preferential accumulation in polar lipids. 20:4n-6 was particularly involved in the cell membranes of ovary and white muscle. Overall, substantial inter-individual variability in the fatty acid compositions of structural lipids was found within tissue types, despite accounting for the biological factors most likely to influence this type of lipid. This suggests that fatty acid profiles are influenced by individual-specific behaviors.

HAL AMU

HAL-INSU

HAL-Université de Bretagne Occidentale

HAL Descartes

HAL-INSA Toulouse

Biological and environmental influence on tissue fatty acid compositions in wild tropical tunas

Author: Anholt
Arnold
Arts
Aurélien Amiel
Bell
Bodin
Budge
Cejas
Dalsgaard
Del Raye
Dickson
Edouard Kraffe
Edwin Fouché
Fany Sardenne
Farley
Frédéric Ménard
Fuller
George
Graham
Graham
Grande
Hazel
Howell
Ishihara
Juan-Jordá
Koussoroplis
Koven
Lands
Laurent Debrauwer
Li
Litzow
Longhurst
Martin
McMurchie
Medina
Morais
Mourente
Mourente
Nathalie Bodin
Ortega
Parrish
Parrish
Parrish
Peng
Regost
Robin
Saito
Sardenne
Sardenne
Schaefer
Schaefer
Schaefer
Scholefield
Schott
Selmi
Sorbera
Swanson
Timohina
Tocher
Tocher
Werbrouck
Wiegand
Zudaire
Zudaire
Publication venue: Elsevier BV

Crossref

Why mâle

Author: Alessio Giovanni.
Allen W. Sidney.
Alvar Manuel
Bloch Oscar
Bourciez Edouard
Burr Isolde.
Charles Guerlin de Guer
Cotgrave Randle.
Daniel Recasens i Vives
Dauby Jean.
Dauzat Albert.
de la Chaussée François.
de la Chaussée François.
DEAF
Diez Friedrich.
Elcock William D.
Ernout Alfred
Flutre Louis-Fernand.
Fouché Pierre.
Fouché Pierre.
Frere Sheppard.
Gaston Zink
Gilliéron Jules
Goetz Georg
Guinet Louis.
Hall Robert A.
Hatzfeld Helmut
Hualde José Ignacio.
James Edward.
Le Blant Edmond F.
Loporcaro Michele.
Lutta C. Martin.
Molho Mauricio.
Morin Yves Charles.
Morlet Marie-Thérèse.
Nyrop Kristoffer.
Oudin Antoine.
Palsgrave John.
Peer Oscar.
Prou Maurice.
Regula Moritz.
Repetti Lori
Rey Alain
Rheinfelder Hans.
Rohlfs Gerhard.
Sampson Rodney.
Sampson Rodney.
Scheer Tobias
Schwan Eduard
Simoni-Aurembou Marie-Rose.
Steriade Donca.
Theo Vennemann
Thurot Charles.
Timpanaro Sebastiano.
Tomás Navarro Tomás
Trévoux
Vielliard Jeanne.
Väänänen Veikko.
Walther von Wartburg
Wilhelm Meyer-Lübke
Witold Mańczak
Wüest Jakob.
Publication venue: Brepols Publishers NV

Crossref